On-line glossary compilation
Abstract
The growth of the Internet has made massive numbers of documents available on-line. This gives us access to many sources of knowledge, but it also raises the question of how to exploit that knowledge efficiently. In this project, we present a system that helps users extract and clarify terms from free-form documents. We developed Glossary Compiler, a system that automatically analyzes web documents in order to extract terms and their definitions. The purpose of an automatic glossary compiler is to aid in constructing a list of definitions across a large collection of documents. In contrast to basic search functionality, where the goal is to find one or more occurrences of a term or a set of terms, a glossary is used to pinpoint definitions. A definition is a concise description of what an entity is. A glossary compiler should therefore perform some basic semantic analysis to distinguish simple occurrences of a term from its actual definitions. There are several challenges to consider. The first is that a definition can be phrased in many ways. The second arises when a single term has multiple definitions, which must be clustered according to the category each definition belongs to. Ideally, everyone can benefit from an on-line glossary compiler, whether an ordinary user who wants to look up an unknown term or a scholar working with a large collection of papers that contain definitions of terms. The advantage of an on-line glossary compiler is twofold. On the one hand, an on-line definition extraction tool can leverage the diversity of content on the Web and has immediate access to a considerably larger set of definitions. Each definition can view the defined term from a different, sometimes unexpected, perspective.
On the other hand, manual compilation of conventional paper glossaries takes a significant amount of time, and a user looking for the definition of a new term may simply not find it in a paper glossary. Since new information appears considerably faster on the Web than in paper sources, automatically compiled glossaries provide access to definitions as soon as they appear on the Web. However, while the automatic identification and extraction of terms from text documents has been widely studied in the linguistic literature, the automatic definition extraction problem is much less studied. In contrast to previous similar projects, e.g. [3], which were essentially based on a bag-of-words treatment of the text and hypertext markup heuristics, in this project we propose a novel approach that treats definition extraction as a search for subtrees in a large set of trees. Our approach has a strong foundation in theoretical linguistics.
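The idea of treating definition extraction as a search for subtrees in a set of trees can be illustrated with a small sketch. This is not the Glossary Compiler implementation, which the abstract does not describe in detail. It assumes, purely for illustration, that each sentence is already parsed into a nested tuple `(label, child, ...)`, that a definition pattern is itself a small tree, and that a node such as `("NP", "?term")` captures the words under any `NP`. The names `match`, `subtrees`, `find_definitions`, and the "X is Y" pattern are hypothetical.

```python
from typing import Dict, Iterator, Optional, Union

# Assumed tree encoding: a leaf is a word (str); an inner node is a
# tuple (label, child, child, ...). This format is an illustration only.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> str:
    """Join the words under a node back into a phrase."""
    if isinstance(tree, str):
        return tree
    return " ".join(leaves(child) for child in tree[1:])


def match(pattern: Tree, tree: Tree) -> Optional[Dict[str, str]]:
    """Return captured phrases if `pattern` matches at the root of `tree`."""
    if isinstance(pattern, str):
        # A literal word must match the leaf exactly.
        return {} if pattern == tree else None
    if isinstance(tree, str) or pattern[0] != tree[0]:
        return None
    # (label, "?name") captures everything under a node with that label.
    if len(pattern) == 2 and isinstance(pattern[1], str) and pattern[1].startswith("?"):
        return {pattern[1][1:]: leaves(tree)}
    if len(pattern) != len(tree):
        return None
    bindings: Dict[str, str] = {}
    for sub_pattern, sub_tree in zip(pattern[1:], tree[1:]):
        result = match(sub_pattern, sub_tree)
        if result is None:
            return None
        bindings.update(result)
    return bindings


def subtrees(tree: Tree) -> Iterator[tuple]:
    """Yield every inner node of `tree`, root first."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def find_definitions(pattern: Tree, forest) -> Iterator[Dict[str, str]]:
    """Search every subtree of every tree in `forest` for `pattern`."""
    for tree in forest:
        for node in subtrees(tree):
            bindings = match(pattern, node)
            if bindings is not None:
                yield bindings


# Hypothetical pattern for sentences of the form "<NP> is <NP>".
IS_A_PATTERN = ("S", ("NP", "?term"),
                ("VP", ("VBZ", "is"), ("NP", "?definition")))

DEFINITION = ("S",
              ("NP", ("DT", "A"), ("NN", "glossary")),
              ("VP", ("VBZ", "is"),
               ("NP", ("DT", "a"), ("NN", "list"),
                ("PP", ("IN", "of"), ("NP", ("NNS", "definitions"))))))
MENTION = ("S",
           ("NP", ("DT", "The"), ("NN", "glossary")),
           ("VP", ("VBD", "grew")))

for found in find_definitions(IS_A_PATTERN, [DEFINITION, MENTION]):
    print(found["term"], "=>", found["definition"])
```

Only the first sentence matches, because the second mentions the term without the defining structure. Since definitions can be phrased in many ways, a practical system would presumably need many such patterns; the single "X is Y" pattern here only shows the matching mechanism.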
Similar resources
Glossary of reference terms for alternative test methods and their validation.
This glossary was developed to provide technical references to support work in the field of alternatives to animal testing. It was compiled from existing reference documents from different sources and is meant to be a point of reference on alternatives to animal testing. Given the ever-increasing number of alternative test methods and approaches being developed over the last...
Evaluation of the govstat statistical interactive glossary: implications for just-in-time help
The GovStat Statistical Interactive Glossary (SIG) is intended to allow users of federal statistical agency websites to look up meanings of statistical terms they encounter on the websites without interrupting their tasks. This kind of just-in-time, just-in-place help is one approach to integrating help with users’ tasks as seamlessly as possible. We discuss implications of an evaluation study o...
Cross-Cultural Research: An Introduction for Students
Outline 1 What is Cross-Cultural Research? Cultural coherence or decoherence within and between human communities: human behavior, beliefs, and institutions 2 A Course in Cross-Cultural Research 3 Goals and Outcome 4 Tools: Spss; Maps and MapTab; Statistics for Galton’s Problem 5 Topics and Terms: Lists of Topics; Files; References; Glossary 6 Resources: e.g., On-line articles, e.g., JSTOR “Pol...
Compilation of a Multilingual (Spanish / English / French / Portuguese) Glossary of Rural Tourism Terms of Castile and Leon
Our aim is to give an account of the process followed in compiling a multilingual lexicon of rural tourism terms. This lexicon provides equivalents of local, culturally loaded Spanish terms in English, French and Portuguese, the languages spoken by the vast majority of visitors to Castile and Leon. This tool will help improve communication in the catering industry in thi...
Publication date: 2006